Fault Diagnosis for Rolling Bearing of Combine Harvester Based on Composite-Scale-Variable Dispersion Entropy and Self-Optimization Variational Mode Decomposition Algorithm

Because of harsh and variable working environments, the vibration signals of rolling bearings in combine harvesters usually show strong non-stationarity and nonlinearity, which makes accurate fault diagnosis from these signals a challenging subject. In this paper, a novel fault diagnosis method based on composite-scale-variable dispersion entropy (CSvDE) and self-optimization variational mode decomposition (SoVMD) is proposed, systematically combining the nonstationary signal analysis approach and machine learning technology. Firstly, an improved SoVMD algorithm is developed to realize adaptive parameter optimization and to extract multiscale frequency components from the original signals. Subsequently, a CSvDE-based feature learning model is established to generate the multiscale fault feature space (MsFFS) of the frequency components and thereby improve the fault feature learning ability. Finally, the generated MsFFS serves as the input of the Softmax classifier for fault category identification. Extensive experiments on vibration datasets collected from rolling bearings of combine harvesters are conducted, and the experimental results demonstrate the superior and more robust fault diagnosis performance of the proposed method compared to other existing approaches.


Introduction
As a widely used piece of agricultural machinery, the combine harvester plays an essential role in the automatic process of crop harvesting [1]. The rolling bearing is a fundamental and important load-carrying component in the combine harvester and has a significant influence on the stable and reliable operation of the equipment [2]. Because of the harsh and variable operating environment, several different types of faults gradually occur in key parts of the bearing and may lead to serious safety incidents if they are not treated in a timely manner [3,4]. Therefore, accurate bearing fault diagnosis is of great importance for ensuring the continuous and healthy operation of a combine harvester and has attracted increasing research attention.
Recently, with the development of machine learning technology, various fault detection and diagnosis approaches have been proposed and have achieved successful applications in engineering practice. In Reference [5], a decentralized SVDD-based fault diagnosis method was presented and the experimental results demonstrated the feasibility of the method. To facilitate the detection of incipient faults, Zhao designed an auxiliary input signal for active fault diagnosis [6]. Generally, the vibration signals of the rolling bearing contain a large amount of information that reflects the actual health states and usually serve as the data inputs of the fault diagnosis model [7]. However, because of the influence of complex working conditions and the system dynamic response, the collected vibration signals present significant characteristics of strong non-stationarity and nonlinearity in most cases [8]. For this reason, various advanced time-frequency signal analysis methods have been applied successfully for signal processing to reduce the complexity of the original signal and further learn useful state information for diagnosis [9]. On this foundation, the redundant information contained in raw signals can be effectively filtered and the corresponding results of signal analysis provide solid support for subsequent feature extraction. The most frequently used time-frequency analysis approaches include the wavelet transform (WT), empirical mode decomposition (EMD), and a set of related improved methods, such as the empirical wavelet transform (EWT) and ensemble EMD (EEMD) [10][11][12][13]. Liang used a WT-based method to extract the fault features and realize the fault diagnosis of a bearing [10]. In Reference [11], an improved EWT approach was presented for bearing diagnosis. 
Although WT and its improved methods have been widely applied in the field of diagnosis, the problems of insufficient adaptability and wavelet basis selection still restrict their practical application. Unlike WT, EMD and its optimized methods have effectively achieved adaptive performance [12,13]. Nevertheless, the mode mixing phenomenon that exists in the results of signal analysis cannot be ignored, because it may cause an obvious decrease in the final diagnosis accuracy. Fortunately, as an adaptive algorithm, variational mode decomposition (VMD) helps to extract a series of frequency components from the original signal and, at the same time, avoids the influence of mode mixing [14]. The algorithm can determine the relevant bands adaptively and balance the errors between different frequency components to separate the components from the original signal. Zhou proposed a VMD-based method for bearing fault diagnosis and achieved better performance than existing diagnosis technology [15]. In addition, the effectiveness of VMD has been validated in many successful practical applications, as can be seen in References [16,17]. Nevertheless, it is worth noting that two important parameters of the VMD algorithm, the number of components and the penalty factor, are usually set randomly, which has a significant influence on the final decomposition results [18]. For this reason, adaptively optimizing these parameters of VMD has become a hot topic in the fields of non-stationary signal processing and fault diagnosis.
Based on the preliminary results of signal analysis, the state features of the bearing can be further captured for diagnosis [19][20][21][22]. Zhao proposed a bearing multi-fault diagnosis method guided by the instantaneous fault characteristic frequency extraction and enhanced instantaneous rotational frequency matching, and the related experiment results validated the effectiveness of the method [19]. Wang developed a bilayer convolutional transfer learning neural network with better generalization performance to effectively extract the fault features [20]. In Reference [21], the bearing fault type was determined directly using the fault characteristic frequency and rotational frequency harmonics. Recently, due to the development of artificial intelligence technology and system dynamics theory, entropy-based feature extraction approaches have become a hot topic to be explored gradually [23,24]. As a transcendental and important statistical concept in many disciplines, entropy can effectively measure the dynamic changes of vibration signal when a fault occurs without the linear hypothesis. Because of the significant advantages, different types of entropies have been developed to automatically capture the effective features from raw signals for fault diagnosis [25,26]. For instance, approximate entropy (AE) was proposed to describe the underlying deterministic changes and further measure the dynamic changes in original signals [27]. Based on this, an AE-based model was proposed to identify the spall-like fault [28]. Limited by the theoretical basis of AE, the quality of the obtained features can be easily affected by the signal length and more similarity would be generated so that the diagnosis cannot give satisfactory results [25]. As the improvements of AE, sample entropy (SE) [29] and permutation entropy (PE) [30] were developed and effectively overcame the limitations of AE. 
Gao proposed an SE-based method to complete the task of early fault diagnosis of bearings [31]. In Reference [32], a bearing fault feature space was constructed using PE theory. However, some inherent defects for these entropies, such as the problems of boundary discontinuity and amplitude information loss, would also have a significant impact on the final diagnosis results [33,34]. Unlike the abovementioned entropies, dispersion entropy (DE) has an excellent ability for measuring the irregularity of signals and solves the problems existing in these entropies [35]. Due to the ideal robustness and computation efficiency, DE has been widely applied in the fields of fault diagnosis [36][37][38]. But it should be noted that these different types of entropies are all single-scale analysis methods, without taking the dynamic characteristics in multiple scales into consideration [39]. Specifically, some essential state information may be contained in these features. Considering the advantages of DE, it is a valuable subject where a novel multi-scale feature extraction method integrated with DE should be provided to achieve the goal of accurate fault diagnosis.
The main contribution of this work is the development of a novel rolling bearing fault diagnosis method for a combine harvester based on composite-scale-variable dispersion entropy (CSvDE) and self-optimization VMD (SoVMD) algorithms, systematically blending the nonstationary signal analysis technique and machine learning technology. The block diagram of the proposed method is depicted in Figure 1. In general, the implementation of the proposed method can be divided into three stages: the first-stage multiscale frequency component extraction, the second-stage MsFFS construction, and the third-stage fault state identification. In more detail, to eliminate the influence of the strong non-stationarity and nonlinearity of the original signals on the diagnosis results, an improved SoVMD algorithm is first developed to decompose the original vibration signal into several multiscale frequency components. It can be seen as an improvement on the traditional VMD described in References [14][15][16] in that the parameters of VMD are adaptively optimized by the SoVMD method. Subsequently, a new CSvDE-based model is designed to construct the multiscale fault feature space (MsFFS) of the frequency components. Compared with other entropy-based feature extraction approaches introduced in References [33][34][35], the constructed MsFFS effectively integrates the advantage of the variable scales of CSvDE and has great potential to reveal essential information about different fault states. Finally, based on the MsFFS, the softmax classifier is used to identify the fault states of bearings. Overall, the multiscale frequency component extraction can be regarded as the preprocessing stage of the subsequent MsFFS construction, and the acquired MsFFS of bearings serves as the input of the softmax model.
Extensive experiments on the vibration datasets collected from rolling bearings of the combine harvester are implemented and the experimental results demonstrate the more superior and robust fault diagnosis performance of the proposed method compared to other existing approaches.
The rest of the work can be briefly introduced as follows. Section 2 describes the research methodology and the general procedure of the proposed method. In Section 3, the experimental results and the corresponding discussion are presented in detail. At the end of the paper, the conclusions are given in Section 4.
VMD is an adaptive non-stationary signal processing method [14]. The k-th mode component using VMD can be defined as
$$c_k(t) = A_k(t)\cos\big(\theta_k(t)\big), \quad k = 1, 2, \ldots, K \tag{1}$$
where $A_k(t)$ and $\theta_k(t)$ are the instantaneous amplitude and phase, respectively, and $K$ represents the number of mode components. In order to estimate the optimal bandwidth of mode $c_k(t)$, a constrained variational model can be established as [14]
$$\min_{\{c_k\},\{\varphi_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * c_k(t) \right] e^{-j\varphi_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} c_k(t) = s(t) \tag{2}$$
where $\varphi_k$ is the center frequency, $\partial_t$ is the gradient operation, $\delta(t)$ is the Dirac function, $*$ represents the convolution operator, $j$ is the imaginary unit, and $s(t)$ is the raw signal.
To obtain the solution of Equation (2), a penalty factor $\alpha$ and a Lagrangian multiplier $\lambda$ should be introduced as
$$L\big(\{c_k\},\{\varphi_k\},\lambda\big) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * c_k(t) \right] e^{-j\varphi_k t} \right\|_2^2 + \left\| s(t) - \sum_{k=1}^{K} c_k(t) \right\|_2^2 + \left\langle \lambda(t),\, s(t) - \sum_{k=1}^{K} c_k(t) \right\rangle \tag{3}$$
Then, the alternating direction method of multipliers is used to solve Equation (3), and the mode $c_k(t)$ can be acquired as
$$\hat{c}_k^{\,n+1}(\varphi) = \frac{\hat{s}(\varphi) - \sum_{i \neq k} \hat{c}_i(\varphi) + \hat{\lambda}(\varphi)/2}{1 + 2\alpha(\varphi - \varphi_k)^2} \tag{4}$$
where $n$ is the number of iterations, and $\hat{c}_k^{\,n+1}(\varphi)$, $\hat{s}(\varphi)$, $\hat{c}_i(\varphi)$, and $\hat{\lambda}(\varphi)$ are the Fourier transforms of $c_k^{n+1}(t)$, $s(t)$, $c_i(t)$, and $\lambda(t)$, respectively.
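As an illustration of the frequency-domain mode update used in the VMD formulation of Reference [14], the following minimal Python sketch applies the Wiener-filter-like update to spectra that are assumed to be already available as lists of complex values. The function name `update_mode` and its arguments are hypothetical and are not taken from the paper; the Fourier transform, the center-frequency update, and the multiplier update are deliberately left out.

```python
def update_mode(s_hat, modes_hat, lam_hat, k, freqs, center_freq, alpha):
    """One frequency-domain update of mode k (hypothetical helper).

    Implements, bin by bin,
        c_k(f) = (s(f) - sum_{i != k} c_i(f) + lam(f) / 2)
                 / (1 + 2 * alpha * (f - center_freq) ** 2)
    where all spectra are lists of (complex) values sampled at `freqs`.
    """
    updated = []
    for idx, f in enumerate(freqs):
        # Spectrum already explained by the other modes at this bin.
        others = sum(m[idx] for i, m in enumerate(modes_hat) if i != k)
        numerator = s_hat[idx] - others + lam_hat[idx] / 2
        # Bins far from the center frequency are attenuated by alpha.
        updated.append(numerator / (1 + 2 * alpha * (f - center_freq) ** 2))
    return updated


# Toy usage: two frequency bins, two modes, zero multiplier.
freqs = [0.1, 0.3]
s_hat = [1 + 0j, 1 + 0j]
modes_hat = [[0j, 0j], [0j, 0j]]
lam_hat = [0j, 0j]
mode0 = update_mode(s_hat, modes_hat, lam_hat, k=0, freqs=freqs,
                    center_freq=0.1, alpha=1000.0)
```

In this toy case the bin at the center frequency passes unchanged, while the bin at 0.3 is divided by roughly 81, which shows how a larger penalty factor narrows the bandwidth of each mode.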

The Developed SoVMD Algorithm
In order to eliminate the influence of strong vibration signal nonlinearity on the diagnosis result, a self-optimization VMD (SoVMD) algorithm is designed to extract the multiscale frequency components from the original signal without the problem of mode mixing and further contributes to learning the inherent characteristics of bearing fault patterns from different scales. More specifically, from the perspective of parameter optimization, the developed SoVMD adopts a hierarchical search structure with adjustable step sizes and effectively achieves the goal of adaptive parameter search (including the number of components K and the penalty factor α), which is a significant improvement compared to the traditional VMD method. The detailed steps of SoVMD can be summarized as follows.
Step 1: Initialize the parameters of the VMD method, including the frequency component c 1 , the center frequency φ 1 , and the Lagrangian multiplier λ 1 , and pre-determine the search intervals of K and α.
Step 2: Initialize the parameters related to the searching process, i.e., the maximum number of iterations M, the population size of searching particles P, the initial step size SS 0 , and the initial searching location (X 0 , Y 0 ).
Step 3: The location (X j , Y j ) of the j-th particle is adjusted in a random direction, where i represents the current iteration number, 1 ≤ i ≤ M, and SS i represents the step size of the i-th iteration. The construction of Equation (7) realizes the dynamic optimization of the search step size and further improves the efficiency and accuracy of parameter searching.
Step 4: The distance between the particle location and the coordinate origin (0, 0) can be obtained as $\sqrt{X_j^2 + Y_j^2}$.
Step 5: Calculate the concentration value Cv j of each particle using the concentration judgment function, i.e., the fitness function, where R represents the root-mean-square error of the training samples.
Step 6: Based on the search process among the whole population, the optimal concentration value Cv op and the optimal particle location (X op , Y op ) are acquired and updated.
Step 7: Steps 3-6 are repeated with i = i + 1 until the decision condition i = M is met.
Step 8: Through the above steps, the optimal values of K and α can be obtained. Similar to the implementation process of the VMD method depicted in Reference [14], the raw signal s(t) can be decomposed into K components with different frequency scales:
$$s(t) = \sum_{k=1}^{K} c_k(t)$$
It is worth noting that the whole process of parameter optimization in the proposed SoVMD method can be primarily divided into two stages, i.e., the initial stage with larger search step sizes and the later stage with smaller step sizes. Specifically, a large step size helps to accelerate convergence and strengthen the global optimization performance, while a small step size serves the purpose of accurate local search. Consequently, because of the dynamically adjustable step size adopted in the SoVMD approach, the conflict between global and local optimization can be effectively balanced, and the efficiency and accuracy of parameter optimization can be significantly improved. The flowchart of the proposed SoVMD algorithm is depicted in Figure 2.
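The two-stage behaviour described above (large steps first, small steps later) can be illustrated with a simplified search loop. This is only a sketch under stated assumptions and not the SoVMD algorithm itself: the linear step-size decay, the direct mapping of a particle location to (K, α), the starting point, and the function name `sovmd_like_search` are all hypothetical, and the distance and concentration computations of Steps 4 and 5 are replaced by a direct call to a user-supplied fitness function (standing in for the root-mean-square error R).

```python
import random


def sovmd_like_search(fitness, k_range=(3, 15), alpha_range=(500.0, 1500.0),
                      max_iter=100, pop_size=200, step0=1.0, seed=0):
    """Search (K, alpha) minimizing `fitness` with a shrinking random step."""
    rng = random.Random(seed)
    k_lo, k_hi = k_range
    a_lo, a_hi = alpha_range
    # Hypothetical initial location: the centre of the search box.
    best_x, best_y = (k_lo + k_hi) / 2, (a_lo + a_hi) / 2
    best_fit = fitness(round(best_x), best_y)
    for i in range(max_iter):
        # Hypothetical linear decay: global search first, local search later.
        step = step0 * (1 - i / max_iter)
        cand_x, cand_y, cand_fit = best_x, best_y, best_fit
        for _ in range(pop_size):
            # Random direction; the step is a fraction of each half-interval.
            x = best_x + step * rng.uniform(-1, 1) * (k_hi - k_lo) / 2
            y = best_y + step * rng.uniform(-1, 1) * (a_hi - a_lo) / 2
            # Keep the particle inside the pre-determined search intervals.
            x = min(max(x, k_lo), k_hi)
            y = min(max(y, a_lo), a_hi)
            fit = fitness(round(x), y)  # K must be an integer
            if fit < cand_fit:
                cand_x, cand_y, cand_fit = x, y, fit
        # Keep the best particle of this iteration as the new centre.
        best_x, best_y, best_fit = cand_x, cand_y, cand_fit
    return round(best_x), best_y, best_fit


# Toy fitness with a known minimum at K = 7, alpha = 900 (illustration only).
def toy_fitness(k, alpha):
    return (k - 7) ** 2 + ((alpha - 900.0) / 100.0) ** 2


best_k, best_alpha, best_fit = sovmd_like_search(
    toy_fitness, max_iter=50, pop_size=100)
```

In a real setting the fitness function would run VMD with the candidate (K, α) and return an error measure, which is far more expensive than the toy function used here.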

Dispersion Entropy
As a non-linear characteristic indicator, dispersion entropy (DE) can be used to evaluate the irregularity and uncertainty of the signal sequence quantitatively [35]. Based on the advantages of low computation consumption and taking the amplitude's order and relationship with the theoretical system into consideration, DE helps to obtain more reliable and robust diagnosis results. Given a signal sequence X(t) = {x(1), x(2), · · · , x(N)} (N is the length of the sequence), the DE of this sequence can be calculated by the following steps.
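(1) For the original sequence X(t), a corresponding mapped sequence U(t) = {u(1), u(2), · · · , u(j), · · · , u(N)} can be constructed based on the following formula:
$$u(j) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x(j)} \exp\left(-\frac{(t-\mu)^2}{2\sigma^2}\right) dt$$
where µ is the expectation and σ is the standard deviation. Specifically, the value of u(j) is between 0 and 1.
(2) Then, for the element u(j), an integer $v^{m}(j)$ between 1 and m can be obtained by a linear model:
$$v^{m}(j) = \mathrm{round}\big(m \cdot u(j) + 0.5\big)$$
where m represents the number of categories and round(·) represents the integer function.
(3) Based on the above equation, an embedding sequence $v^{\gamma,m}(j)$ can be defined as follows:
$$v^{\gamma,m}(j) = \big\{ v^{m}(j),\, v^{m}(j+\xi),\, \ldots,\, v^{m}\big(j+(\gamma-1)\xi\big) \big\}, \quad j = 1, 2, \ldots, N-(\gamma-1)\xi$$
where γ is the embedding dimension and ξ is the time delay. In particular, each $v^{\gamma,m}(j)$ can be mapped into a dispersion mode $\Pi_{r_0 r_1 \cdots r_{\gamma-1}}$, with $r_0 = v^{m}(j)$, $r_1 = v^{m}(j+\xi)$, …, $r_{\gamma-1} = v^{m}\big(j+(\gamma-1)\xi\big)$.
Entropy 2023, 25, 1111
(4) Calculate the relative frequency of each mode by the following formula:
$$p\big(\Pi_{r_0 r_1 \cdots r_{\gamma-1}}\big) = \frac{\mathrm{Number}\big\{ j \mid j \le N-(\gamma-1)\xi,\ v^{\gamma,m}(j) \text{ has mode } \Pi_{r_0 r_1 \cdots r_{\gamma-1}} \big\}}{N-(\gamma-1)\xi}$$
(5) Based on the definition of Shannon entropy, the DE of sequence X(t) can be calculated as
$$\mathrm{DE}(X, \gamma, m, \xi) = -\sum_{\Pi=1}^{m^{\gamma}} p(\Pi) \ln p(\Pi)$$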
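The five steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, assuming the normal cumulative distribution function is used for the mapping in step (1) and the natural logarithm in step (5); the function name `dispersion_entropy` and the handling of constant signals are choices made here, not details from the paper.

```python
import math
from collections import Counter


def dispersion_entropy(x, emb_dim=4, n_classes=6, delay=1):
    """Dispersion entropy of sequence x (natural logarithm)."""
    n = len(x)
    n_vec = n - (emb_dim - 1) * delay  # number of embedding vectors
    if n_vec <= 0:
        raise ValueError("sequence too short for the embedding settings")
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if sigma == 0:
        return 0.0  # assumption: a constant signal has a single mode
    # Step (1): map each value into (0, 1) with the normal CDF.
    u = [0.5 * (1 + math.erf((v - mu) / (sigma * math.sqrt(2)))) for v in x]
    # Step (2): assign classes 1..m; Python rounds exact halves to even,
    # and the result is clamped so it always stays inside 1..m.
    classes = [min(n_classes, max(1, round(n_classes * uj + 0.5)))
               for uj in u]
    # Step (3): build embedding vectors and count each dispersion mode.
    modes = Counter(
        tuple(classes[j + d * delay] for d in range(emb_dim))
        for j in range(n_vec)
    )
    # Steps (4) and (5): relative frequencies and Shannon entropy.
    return -sum((c / n_vec) * math.log(c / n_vec) for c in modes.values())


# An alternating signal visits only two modes, so DE is close to ln(2).
alternating = [0.0, 1.0] * 50
de_value = dispersion_entropy(alternating, emb_dim=2)
```

The value is bounded above by ln(m^γ), which is reached only when all dispersion modes are equally likely.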

The Proposed CSvDE Theory
Based on the extracted multiscale frequency components, the valuable and inherent features should be learned from these components for accurate fault diagnosis. Considering the complexity of failure causes and the diversity of frequency scales, a composite-scale-variable dispersion entropy (CSvDE) theory is developed, and the resulting values serve as effective features for bearing fault diagnosis. Compared with the classical DE approach, the proposed CSvDE method helps to reveal sensitive characteristics of components at different scales and, at the same time, retains the information that is essential for accurate diagnosis. The computation principle of CSvDE can be illustrated in detail as follows. Suppose $c_i(t) = \{c_i(1), c_i(2), \cdots, c_i(N)\}$ represents the obtained i-th frequency component and N represents the length of a component. Some coarse-graining sequences can be generated as follows:
$$c^{(\zeta)}_{w}(j) = \frac{1}{\zeta} \sum_{b = w + (j-1)\zeta}^{w + j\zeta - 1} c_i(b), \quad 1 \le j \le \left\lfloor \frac{N - w + 1}{\zeta} \right\rfloor, \quad 1 \le w \le \zeta$$
where ζ is the scale factor and w is the order number of the coarse-graining sequences. To sum up, the CSvDE of component $c_i(t)$ can be finally calculated by the following formula:
$$\mathrm{CSvDE}(c_i, \gamma, m, \xi, \zeta) = -\sum_{\Pi=1}^{m^{\gamma}} \bar{p}(\Pi) \ln \bar{p}(\Pi), \qquad \bar{p}(\Pi) = \frac{1}{\zeta} \sum_{w=1}^{\zeta} p^{(\zeta)}_{w}(\Pi)$$
where $\bar{p}(\Pi)$ is the average relative frequency of mode Π in the sequences $c^{(\zeta)}_{w}$.
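The coarse-graining and averaging idea can be sketched as follows. The sketch assumes a refined-composite form: for scale factor ζ, the ζ shifted coarse-grained sequences are built by non-overlapping averaging, the relative frequency of each dispersion mode is averaged over them, and the Shannon entropy of the averaged frequencies is returned. It also assumes that the mean and standard deviation of the original component are reused at every scale. The names `csvde` and `_mode_counts` are hypothetical, and the paper's exact definition may differ in detail.

```python
import math
from collections import Counter


def _mode_counts(seq, mu, sigma, emb_dim, n_classes, delay):
    """Count dispersion modes of seq using a fixed normal CDF mapping."""
    n_vec = len(seq) - (emb_dim - 1) * delay
    if n_vec <= 0:
        raise ValueError("coarse-grained sequence too short")
    u = [0.5 * (1 + math.erf((v - mu) / (sigma * math.sqrt(2)))) for v in seq]
    cls = [min(n_classes, max(1, round(n_classes * uj + 0.5))) for uj in u]
    counts = Counter(
        tuple(cls[j + d * delay] for d in range(emb_dim))
        for j in range(n_vec)
    )
    return counts, n_vec


def csvde(component, scale, emb_dim=4, n_classes=6, delay=1):
    """Entropy of mode frequencies averaged over shifted coarse-grainings."""
    n = len(component)
    mu = sum(component) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in component) / n)
    if sigma == 0:
        return 0.0  # assumption: a constant component has a single mode
    avg_prob = Counter()
    for w in range(scale):  # w is the zero-based start offset
        n_seg = (n - w) // scale
        # Non-overlapping averages of `scale` consecutive points.
        coarse = [
            sum(component[w + j * scale: w + (j + 1) * scale]) / scale
            for j in range(n_seg)
        ]
        counts, n_vec = _mode_counts(coarse, mu, sigma,
                                     emb_dim, n_classes, delay)
        for mode, c in counts.items():
            avg_prob[mode] += c / n_vec / scale
    return -sum(p * math.log(p) for p in avg_prob.values() if p > 0)


alternating = [0.0, 1.0] * 50
value_scale1 = csvde(alternating, scale=1, emb_dim=2)  # close to ln(2)
value_scale2 = csvde(alternating, scale=2, emb_dim=2)  # averaging flattens it
```

At scale 1 the result coincides with the ordinary dispersion entropy of the component, while at scale 2 the alternating toy signal is averaged into a constant sequence and the entropy drops to zero.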

The Implementation of the Proposed Fault Diagnosis Method
In this paper, a novel fault diagnosis method is proposed for the rolling bearing of a combine harvester. Based on some improved models, including SoVMD and CSvDE, a multiscale fault feature space (MsFFS) can be constructed and the purpose of accurate fault diagnosis can be further achieved. This section illustrates the strategy of MsFFS construction and the detailed implementation procedure of the proposed fault diagnosis method.

The Construction of Multiscale Fault Features Space
Considering the high similarity and complexity between different types of bearing fault signals, it is difficult to obtain accurate diagnosis results by depending only on the CSvDE values of the raw signal with a single scale factor. For this reason, a high-dimensional feature pool, written as the MsFFS, is constructed to characterize the essential information of fault categories from the perspective of various scales. In more detail, based on the multiscale frequency components obtained using the SoVMD algorithm, the CSvDE features of these components under different scale factors need to be acquired so that the MsFFS of signal samples can be established. The generated MsFFS can be denoted as follows, which systematically couples different scales from the levels of the frequency component and dispersion entropy:
$$\mathrm{MsFFS} = \begin{bmatrix} \mathrm{CSvDE}^{(1)}_{1,1} & \cdots & \mathrm{CSvDE}^{(1)}_{1,L} & \cdots & \mathrm{CSvDE}^{(1)}_{K,1} & \cdots & \mathrm{CSvDE}^{(1)}_{K,L} \\ \vdots & & \vdots & & \vdots & & \vdots \\ \mathrm{CSvDE}^{(S)}_{1,1} & \cdots & \mathrm{CSvDE}^{(S)}_{1,L} & \cdots & \mathrm{CSvDE}^{(S)}_{K,1} & \cdots & \mathrm{CSvDE}^{(S)}_{K,L} \end{bmatrix}$$
where $\mathrm{CSvDE}^{(s)}_{k,\zeta}$ denotes the CSvDE value of the k-th frequency component of the s-th sample at scale factor ζ, S and K are the number of samples and frequency components, respectively, and L is the maximum of factor ζ. It can be found from Equation (21) that the dimensionality of the constructed MsFFS is S × (KL). In addition, from Equation (21), it can be seen that there are four parameters that need to be considered in CSvDE and MsFFS, including the embedding dimension γ, the number of categories m, the time delay ξ, and the scale factor ζ. Meanwhile, comparing Equations (18) and (21), we can observe that there are three common parameters in DE and CSvDE, i.e., γ, m, and ξ. Based on the principle of DE described in Reference [35], an appropriate value of the embedding dimension γ is of great significance for sensitively identifying the dynamic changes in the original signal. In other words, a γ that is too small inevitably makes the dynamic changes of the signal harder to identify, while a γ that is too large easily results in a sluggish response to minor changes. Moreover, the number of categories m should be larger than 1 to guarantee enough dispersion modes in the MsFFS. When m is too small, two amplitude values that are far from each other may be assigned to the same category. When m is too large, a very small difference may change the category, and the results of the MsFFS are easily affected by noise. Also, if γ or m is too large, the computation time becomes very high. Thus, it is recommended to choose m from 4 to 9 [35,36]. It should be noted that the number of potential dispersion modes $m^{\gamma}$ needs to satisfy the following condition: $m^{\gamma} \le N$ (N is the length of frequency component c i ) [35,36]. For the time delay ξ, it is suggested that the corresponding value should be set as 1 [35]. If ξ > 1, some important frequency information may be discarded. Consequently, referring to the literature [35,36], these three parameters are set in this study as γ = 4, m = 6, and ξ = 1.
And for the important parameter ζ in the theoretical framework of CSvDE, too small a value is not sufficient to capture essential differences between different types of fault samples and too large a value significantly increases the computation cost. A further analysis of the influence of scale factor ζ on the diagnosis results is shown later in the next section.
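The construction of the feature space can be illustrated with a short sketch that arranges one feature value per (component, scale factor) pair into a row for each sample, which gives a table with S rows and K·L columns. The function `build_msffs` and the toy feature function are hypothetical stand-ins; in the method described here the feature function would be the CSvDE of a frequency component at a given scale factor.

```python
def build_msffs(decomposed_samples, feature_fn, max_scale):
    """Return an S x (K*L) list of lists.

    decomposed_samples: list of S samples, each a list of K components.
    feature_fn(component, scale): feature of one component at one scale.
    max_scale: L, the largest scale factor (scales run from 1 to L).
    """
    rows = []
    for components in decomposed_samples:
        # Component-major order: all scales of component 1, then 2, ...
        row = [
            feature_fn(component, scale)
            for component in components
            for scale in range(1, max_scale + 1)
        ]
        rows.append(row)
    return rows


# Toy data: S = 3 samples, K = 2 components each, L = 3 scales.
toy_samples = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
    [[0.0, 0.0], [1.0, 1.0]],
]


def toy_feature(component, scale):
    # Placeholder feature, used only to make the layout visible.
    return sum(component) / scale


msffs = build_msffs(toy_samples, toy_feature, max_scale=3)

# Parameter check for gamma = 4, m = 6 and samples of 2000 points:
# the number of potential dispersion modes must not exceed N.
n_modes = 6 ** 4  # 1296
```

With the settings γ = 4 and m = 6, there are 1296 potential dispersion modes, which satisfies the condition for samples of 2000 points.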

The Procedure of the Proposed Method
The flowchart of the proposed fault diagnosis method for rolling bearings of combine harvesters is presented in Figure 3. More specifically, because of the strong parameter self-optimization ability and superior decomposition performance of the SoVMD described in Section 2.1, it is used to decompose the fault signals, and then the MsFFS of bearings is constructed based on the developed CSvDE theory. The implementation procedure of the proposed method is summarized as follows.
(1) The vibration signals of harvester rolling bearings are collected by relevant sensors and the data acquisition system.
(2) The collected vibration signals are randomly divided into two parts, the training set and the testing set.
For this, the cross-entropy function is considered to calculate the corresponding fault identification loss as [40,41]
$$\mathrm{Loss} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} \mathbb{1}\{l_i = c\} \log P_{x_i \to c}$$
where $x_i$ represents the raw signal sample, $l_i$ represents the corresponding truth label, $\hat{l}_i$ represents the predicted label of the softmax model, n is the number of samples, C is the number of fault categories, $P_{x_i \to c}$ represents the corresponding probability of sample $x_i$ belonging to fault category c, and $G_l$ represents the softmax classifier.
(6) A testing set is utilized to validate the feasibility and superiority of the proposed diagnosis method.
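To make the loss concrete, the sketch below computes softmax probabilities and the average cross-entropy for a small batch. It assumes the standard cross-entropy form, in which the loss of a sample is the negative log-probability of its true category, and it takes the per-category scores as given; how the classifier maps MsFFS features to these scores (its weights and training) is outside this sketch. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw category scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_loss(score_rows, labels):
    """Average negative log-probability of the true category."""
    loss = 0.0
    for scores, label in zip(score_rows, labels):
        probs = softmax(scores)
        loss -= math.log(probs[label])
    return loss / len(labels)


def predict(scores):
    """Predicted label: index of the most probable category."""
    return max(range(len(scores)), key=scores.__getitem__)


# With equal scores for C = 3 categories, every probability is 1/3 and the
# loss equals ln(3), whatever the true labels are.
uniform_loss = cross_entropy_loss([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0, 2])
predicted = predict([0.1, 2.0, -1.0])
```

A confident and correct prediction drives the loss toward zero, which is what training the classifier on the training set aims for.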
Through the above diagnosis steps, the specific fault state of a harvester rolling bearing can be identified and determined. In essence, the fault diagnosis results provided by the proposed method are equivalent to the classification results of the softmax classifier. In addition, based on the diagnosis results, two commonly used evaluation metrics are calculated to analyze the performance of the diagnosis method: the diagnosis accuracy and the false positive rate (FPR), i.e., the false alarm rate [42,43]. Based on the machine learning theory related to the classification problem, the definitions of these two metrics are presented as follows.
$$\mathrm{Accuracy} = \frac{1}{M} \sum_{i=1}^{C} TP_i, \qquad \mathrm{FPR} = \frac{1}{C} \sum_{i=1}^{C} \frac{FP_i}{FP_i + TN_i}$$
where C is the number of fault categories; $TP_i$, $TN_i$, and $FP_i$ represent the true positives, true negatives, and false positives for the i-th fault category, respectively; and M is the total number of testing samples. A larger accuracy and a smaller FPR represent better performance of the diagnosis method.
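Both metrics can be computed from a confusion matrix, as in the sketch below. It assumes that accuracy is the sum of per-category true positives divided by M and that the FPR is averaged over the C categories (a macro average); the averaging choice and the function name `accuracy_and_fpr` are assumptions made for illustration.

```python
def accuracy_and_fpr(confusion):
    """confusion[t][p]: number of samples with true label t predicted as p."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)  # M, all testing samples
    true_pos = [confusion[i][i] for i in range(n_classes)]
    accuracy = sum(true_pos) / total
    fpr_sum = 0.0
    for i in range(n_classes):
        predicted_as_i = sum(confusion[t][i] for t in range(n_classes))
        actually_i = sum(confusion[i])
        false_pos = predicted_as_i - true_pos[i]
        # Samples that neither belong to category i nor were predicted as i.
        true_neg = total - actually_i - predicted_as_i + true_pos[i]
        fpr_sum += false_pos / (false_pos + true_neg)
    return accuracy, fpr_sum / n_classes


# Toy two-category example with 12 testing samples.
toy_confusion = [
    [5, 1],
    [2, 4],
]
acc, fpr = accuracy_and_fpr(toy_confusion)
```

For the toy matrix, 9 of the 12 samples lie on the diagonal, so the accuracy is 0.75, and the per-category false positive rates 1/3 and 1/6 average to 0.25.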

Dataset Description
In this study, the vibration signals collected from rolling bearings installed on the threshing drum assembly of the combine harvester are used for experimental analysis. The structure of the test platform is depicted in Figure 4; it mainly consists of a motor, a torque transducer, a drum assembly, and a signal acquisition system. As shown in Figure 4, the acceleration sensor is attached in the vertical direction at the front end of the drum assembly and is used to acquire the vibration signals of rolling bearings under different operation conditions with a sampling frequency of 10 kHz. Specifically, faults of different types and defect diameters are seeded on the normal bearings by EDM, including three single-point faults and one combination fault, as shown in Figure 5. More detailed information about the bearing fault states in this experiment can be found in Table 1, in which the abbreviation of each state is defined for clarity. Furthermore, it should be noted that each experiment sample consists of 2000 data points. The time-domain waveforms of raw vibration signals for five states are presented in Figure 6. To validate the stable diagnosis performance of the proposed method, 10 repeated trials with the same setup are conducted in this case study. Notably, all experiments are implemented with MATLAB 2016, and the relevant program runs on a laptop with a 3.2 GHz CPU and 16 GB RAM.

Multiscale Frequency Components Extraction by the SoVMD Algorithm
Because of the strong non-stationarity and nonlinearity of bearing vibration signals, the fault diagnosis accuracy would be obviously reduced if feature extraction were executed directly on the raw signals using the entropy-based approach. In order to reduce the influence of signal complexity on diagnosis accuracy, as mentioned in Section 2.1, the collected signals should be decomposed first to learn the inherent characteristics of fault states at different frequency scales. Based on the developed SoVMD method, a series of multiscale frequency components c i (t) can be effectively obtained from raw signals. Specifically, the parameters related to the searching process are initialized as M = 100, P = 200, and SS 0 = 100. Furthermore, the searching intervals of K and α are set as [3,15] and [500, 1500], respectively. Taking a signal sample of a roller fault with a 1.2 mm defect diameter (RF_12) as an example, Figure 7 shows the frequency components extracted from this sample by SoVMD and the corresponding frequency spectra of these components. We can observe from this figure that the fault sample is decomposed into 10 frequency components and the spectra of these components are significantly different. These decomposition results indicate that the problem of mode mixing can be effectively overcome using the SoVMD algorithm. In the subsequent step, based on the CSvDE theory, the obtained frequency components are used to construct the MsFFS for bearing fault identification.

Analysis of CSvDE Scale Factor
As depicted in Equations (13) and (14), the maximum value of the scale factor ζ should be determined in advance to construct the MsFFS of bearings for accurate fault diagnosis. For this purpose, Figure 8 presents the trend of the average CSvDE values for all training samples under 12 operation states as the scale factor increases from 1 to 20. The remaining three parameters of CSvDE are set as γ = 4, m = 6, and ξ = 1 [35,36]. It can be found from the figure that the values of CSvDE show a gradual decreasing tendency as the scale factor increases, regardless of the working state. Moreover, when ζ > 15, the curve of CSvDE tends to be stable and there is obvious overlap between the CSvDE values of different fault states, which indicates that ζ = 15 is an appropriate upper bound for fault identification. In other words, the interval of the scale factor for the construction of the MsFFS can be determined as ζ ∈ [1, 15].
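To make the roles of the class number, embedding dimension, time delay, and scale factor more concrete, the following is a minimal sketch of the standard dispersion entropy combined with a simple composite coarse-graining step. It is not the CSvDE of Equations (13) and (14), which are not reproduced in this section. The function names, the defaults (embedding dimension 4, class number 6, delay 1, following the description of Table 3), and the choice to normalize each coarse-grained series by its own mean and standard deviation are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def dispersion_entropy(x, m=4, c=6, d=1):
    """Normalized dispersion entropy of x (standard DE, not the paper's CSvDE)."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if sigma == 0.0:
        return 0.0  # a constant series has a single dispersion pattern
    # Map each point to (0, 1) with the normal CDF, then to a class in 1..c.
    classes = []
    for v in x:
        y = 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))
        classes.append(min(c, int(c * y) + 1))
    # Count dispersion patterns of length m with delay d.
    n_patterns = n - (m - 1) * d
    counts = Counter(
        tuple(classes[i + k * d] for k in range(m)) for i in range(n_patterns)
    )
    entropy = -sum(
        (cnt / n_patterns) * math.log(cnt / n_patterns) for cnt in counts.values()
    )
    return entropy / math.log(c ** m)  # normalize by the maximum, ln(c^m)


def coarse_grain(x, scale, offset=0):
    """Average non-overlapping windows of length `scale`, starting at `offset`."""
    n_windows = (len(x) - offset) // scale
    return [
        sum(x[offset + j * scale: offset + (j + 1) * scale]) / scale
        for j in range(n_windows)
    ]


def composite_multiscale_de(x, scale, m=4, c=6, d=1):
    """Average the DE over the `scale` possible coarse-graining start points."""
    values = [
        dispersion_entropy(coarse_grain(x, scale, k), m, c, d)
        for k in range(scale)
    ]
    return sum(values) / scale


# Synthetic signals for illustration only (not the bearing data).
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(2000)]
sine = [math.sin(2.0 * math.pi * i / 200.0) for i in range(2000)]

de_noise = dispersion_entropy(noise)
de_sine = dispersion_entropy(sine)
curve = [composite_multiscale_de(noise, s) for s in range(1, 16)]
print(de_noise > de_sine, len(curve))
```

In this sketch the irregular noise signal yields a higher entropy than the slowly varying sine, which is the kind of contrast the entropy features are meant to capture; the 15-point `curve` corresponds to scale factors 1 to 15.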
In addition, to analyze the difference in the CSvDE features of frequency components among the 12 working states, the CSvDE values of all components under four different scale factors (ζ = 1, 5, 10, 15) are calculated, as shown in Figure 9. It is obvious that the CSvDE values of the last few components are larger than those of the first several components when ζ = 1, and a similar phenomenon occurs in the results of the other three factors. This is because the last few components with high frequency exhibit a stronger randomness and contain more information of fault states compared with the first several components. Moreover, it is worth noting that for the 12 states, the differences in CSvDE values of the last five components are more significant, whereas there is obvious overlap between the CSvDE values of the other components. The relevant results indicate that the last five components show greater potential for identifying different operation states of bearings. Consequently, two groups of experiments are conducted in this study, as described in Table 2. In other words, from the perspective of feature construction, we explore the influence of the MsFFS structure on the diagnosis accuracy. Most notably, more attention should be paid to Experiment 1 to demonstrate the superior performance of the proposed method.

Table 2.
Experiment 1: Without the process of frequency component selection, all components obtained by SoVMD are used to construct the MsFFS.
Experiment 2: Based on the analysis results mentioned above, the last five components are adopted to construct the MsFFS (600 × 75, 600 × 75).

As depicted in Equation (21), the MsFFS can be effectively constructed for bearing fault diagnosis based on the extracted multiscale frequency components and CSvDE theory. Here, utilizing the results of the training set in Experiment 1 as an example, the following formula shows the constructed MsFFS used to train the Softmax classifier:

$$
\mathrm{MsFFS} =
\begin{bmatrix}
\left(\mathrm{CSvDE}(c_1, 4, 6, 1, 1)\right)_{\mathrm{Nora}_1} & \left(\mathrm{CSvDE}(c_1, 4, 6, 1, 2)\right)_{\mathrm{Nora}_1} & \cdots & \left(\mathrm{CSvDE}(c_1, 4, 6, 1, 15)\right)_{\mathrm{Nora}_1} & \cdots & \left(\mathrm{CSvDE}(c_{10}, 4, 6, 1, 15)\right)_{\mathrm{Nora}_1} \\
\vdots & \vdots & & \vdots & & \vdots
\end{bmatrix}
$$
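The column layout of one MsFFS row (all scale factors of the first component, then all scale factors of the next component, and so on) can be sketched as follows. This is an illustrative sketch only: `build_feature_row` and `toy_entropy` are hypothetical names, `toy_entropy` is a placeholder and not CSvDE, and the use of exactly 10 components follows the RF_12 example rather than a guarantee for every sample.

```python
def build_feature_row(components, entropy_fn, max_scale=15):
    """Flatten per-component, per-scale entropy values into one feature row.

    Column order: component 1 at scales 1..max_scale, then component 2, etc.
    """
    row = []
    for component in components:
        for scale in range(1, max_scale + 1):
            row.append(entropy_fn(component, scale))
    return row


def toy_entropy(component, scale):
    """Placeholder feature (NOT CSvDE): mean absolute value divided by scale."""
    return sum(abs(v) for v in component) / (len(component) * scale)


# Ten synthetic "frequency components" of 20 points each (toy data).
components = [[float(i + k) for i in range(20)] for k in range(10)]

row_experiment_1 = build_feature_row(components, toy_entropy)       # all components
row_experiment_2 = build_feature_row(components[-5:], toy_entropy)  # last five only

print(len(row_experiment_1), len(row_experiment_2))  # 150 75
```

With 15 scale factors, five components give 5 × 15 = 75 features per sample, which matches the 75 columns listed for Experiment 2; ten components would give 150.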
The parameter setups of the seven methods in Experiment 1 are shown in Table 3.

Table 3. Description of the parameter setup of seven methods in Experiment 1.

Parameter Setup
The proposed method
For SoVMD, the maximum number of iterations is 1000, and the number of components and the penalty factor are optimized within [3, 15] and [500, 1500], respectively. For CSvDE, the embedding dimension, category number, and time delay are set to 4, 6, and 1, respectively, and the scale factor varies between 1 and 15.

EMD-CSvDE
For EMD, the maximum number of iterations is 1000. For CSvDE, the parameters are the same as those of the proposed method.

VMD-CSvDE
For VMD, the maximum number of iterations is 1000, and the number of components and the penalty factor are arbitrarily set to 7 and 1000, respectively. For CSvDE, the parameters are the same as those of the proposed method.

SoVMD-MSE
For SoVMD, the parameters are the same as those of the proposed method. For MSE, the embedding dimension and time delay are set to 4 and 1, respectively, and the scale factor varies between 1 and 15.

SoVMD-MPE
For SoVMD, the parameters are the same as those of the proposed method. For MPE, the embedding dimension and time delay are set to 4 and 1, respectively, and the scale factor varies between 1 and 15.

SVM
The RBF is used as the kernel function. The penalty factor is set to 3, and the kernel radius is set to 1.

ANN
The structure of the network is 2000-300-12. The learning rate and momentum are 0.1 and 0.3, respectively, and the maximum number of iterations is 1000.

Considering the two different groups of experiments, the detailed diagnosis accuracies and FPRs of 10 trials for seven methods are presented in Figures 10 and 11, respectively. Based on these results, the average accuracies and average FPRs of the seven methods in the two experiments are calculated and listed in Table 4. In addition, to compare the implementation efficiency, the average computation time of 10 trials for the different methods is shown in Table 5.

From the perspective of the diagnosis accuracy and FPR of each trial, it can be seen from Figures 10 and 11 that the accuracy and FPR of the proposed method are obviously superior to those of the other six approaches in both experiments. More comprehensively, as shown in Table 4, the average accuracy and average FPR of the proposed method in Experiment 1 are 96.70% and 0.30%, respectively, which are slightly superior to those of EMD-CSvDE, VMD-CSvDE, SoVMD-MSE, and SoVMD-MPE, and significantly superior to those of SVM and ANN.
Specifically, compared with the other six approaches, the accuracy of the proposed method in Experiment 1 is improved by 4.51%, 2.08%, 4.68%, 6.23%, 55.04%, and 69.89%, respectively, and the FPR is reduced by 55.88%, 41.18%, 57.14%, 63.41%, 91.60%, and 92.4%, respectively. Similar results also appear for Experiment 2 in Table 4. Meanwhile, the standard deviations of the developed method for these two metrics are obviously smaller than those of the other approaches in both groups of experiments, which confirms the stronger stability and robustness of the proposed method for bearing fault diagnosis. In addition, from the results presented in Table 5, it can be found that the average computation time of the proposed method is slightly longer than those of EMD-CSvDE, VMD-CSvDE, SoVMD-MSE, and SoVMD-MPE, and much longer than those of SVM and ANN, in both Experiment 1 and Experiment 2. Compared with the other four combined methods, the adaptive parameter optimization and the MsFFS construction with variable scale factors in the proposed method take more time, which is the cost of the improved diagnosis accuracy. In addition, because SVM and ANN involve neither signal decomposition nor MsFFS construction, their computational costs are lower than those of the other five diagnosis methods.
To give more details, the fault diagnosis results of the proposed method for the sixth trial in Experiment 1 and the corresponding multi-class confusion matrix are shown in Figures 12 and 13, respectively. In Figure 12, it can be seen clearly that a small number of predicted labels of testing samples deviate from the true labels, i.e., misdiagnosis occurs. To intuitively reflect the accuracy rate and error rate, the multi-class confusion matrix is further built from these diagnosis results, as depicted in Figure 13. It can be observed from this figure that the diagnosis accuracy of the different operation states reaches 90% or higher, with six states (Nora, RF_07, IRF_07, IRF_12, ORF_07, and ORF_15) identified with an accuracy of 100%. Moreover, an overall accuracy of 97% is achieved by the proposed method for the sixth trial in Experiment 1, which indicates that the proposed method helps to identify the different fault types and defect severities of the rolling bearing and achieves satisfactory diagnosis accuracy as a whole.

Through the above experiment results, the relevant conclusions can be summarized as follows.
(1) Among all the diagnosis approaches, the proposed method achieves the highest diagnosis accuracy and the lowest FPR in both Experiment 1 and Experiment 2, which strongly confirms its superior performance in bearing fault diagnosis.
(2) Compared with the last two approaches (SVM and ANN), the remaining methods (the proposed method, EMD-CSvDE, VMD-CSvDE, SoVMD-MSE, and SoVMD-MPE) accomplish the task of fault diagnosis with higher accuracy and lower FPR. The main reason is that the multiscale frequency component extraction by the different decomposition algorithms helps to capture the inherent characteristics of the raw signal and to establish an effective feature space for accurate diagnosis.
(3) By adopting the SoVMD algorithm, the diagnosis accuracy and FPR of the developed method are improved compared with EMD-CSvDE and VMD-CSvDE. This is because the optimal parameters of the decomposition process can be adaptively determined by the SoVMD method, so that the multiscale frequency components can be effectively obtained without the influence of the mode mixing problem.
(4) From the perspective of feature space construction, the diagnosis performance of the proposed method is better and more stable than that of the other two approaches, SoVMD-MSE and SoVMD-MPE. Because of the variable scale factor, the developed CSvDE method constructs the MsFFS more effectively and improves the diagnosis performance compared with MSE and MPE.
(5) For the same method, the diagnosis results in Experiment 1 are slightly superior to those in Experiment 2, but the calculation time in Experiment 1 is obviously longer than that in Experiment 2. This is because a small amount of fault information is still contained in the first five frequency components and may be useful for accurate diagnosis. With fewer components taken into account, the computation cost of Experiment 2 decreases significantly.
Figure 13. The multi-class confusion matrix of the proposed method for the 6th trial in Experiment 1.
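Overall accuracy and per-class false positive rates can be read off a multi-class confusion matrix such as the one in Figure 13. The sketch below assumes that rows are true states and columns are predicted states, and it uses a one-vs-rest definition of FPR; the exact FPR formula used in the experiments is not shown in this section, so that definition is an assumption. The 3-class matrix is a made-up toy example, not data from the experiments.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal (rows: true, columns: predicted)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_fpr(cm):
    """One-vs-rest false positive rate FP / (FP + TN) for each class."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    rates = []
    for k in range(n):
        predicted_as_k = sum(cm[i][k] for i in range(n))
        false_positives = predicted_as_k - cm[k][k]
        negatives = total - sum(cm[k])  # samples whose true class is not k
        rates.append(false_positives / negatives if negatives else 0.0)
    return rates


# Toy 3-class confusion matrix (hypothetical counts).
toy_cm = [
    [50, 0, 0],
    [2, 47, 1],
    [0, 3, 47],
]

accuracy = overall_accuracy(toy_cm)
fpr = per_class_fpr(toy_cm)
mean_fpr = sum(fpr) / len(fpr)
print(accuracy, fpr, mean_fpr)
```

Averaging the per-class rates, as done for `mean_fpr`, is one common way to summarize a multi-class FPR in a single number.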
Finally, to intuitively illustrate the superior feature extraction performance of the proposed method, the extracted MsFFS is reduced in dimension and visualized by the t-distributed stochastic neighbor embedding (t-SNE) algorithm. We use the results of the sixth trial in Experiment 1 as an example for analysis, and Figure 14 shows the three-dimensional projections of the original samples and the MsFFS by t-SNE.
It is obvious that the MsFFS obtained by the proposed method can effectively reveal essential information contained in the original samples and accomplish the fault state identification with high accuracy. The main reason is that the complex nonlinear relationships between the raw signal and the MsFFS can be constructed effectively based on the model architecture integrating the component mode with a variable scale factor. To sum up, the proposed method can achieve superior performance in capturing valuable features for accurate fault diagnosis.

Conclusions
In this paper, a novel fault diagnosis method based on CSvDE and SoVMD, systematically integrating the nonstationary signal analysis method with machine learning technology, is proposed for rolling bearings of combine harvesters. Within the developed method, to solve the problems existing in traditional EMD and VMD approaches, a SoVMD algorithm is first designed to extract the multiscale frequency components from the raw signal sample. In essence, an adaptive parameter optimization method is presented in the theoretical framework of SoVMD to search for the VMD parameters with high efficiency. Compared with EMD, the mode mixing problem can be effectively tackled by the SoVMD method. Subsequently, an entropy-based feature construction theory, i.e., CSvDE, is presented to establish the MsFFS for fault diagnosis. Theoretically, the developed CSvDE combines the advantages of a variable scale parameter and DE. Compared with other entropies, such as MSE and MPE, CSvDE yields more accurate and stable results. The results of a case study on the rolling bearing datasets of combine harvesters show that the proposed method has a better and more robust diagnosis performance than other existing approaches. Nevertheless, the computation cost is relatively high; thus, decreasing the time cost while guaranteeing satisfactory accuracy remains a valuable topic to be further explored in the future.
